Getting different responses! 🤔
Answered
Dwarf Crocodile posted this in #help-forum
Dwarf CrocodileOP
So, earlier, I was doing data fetching from S3 directly on the frontend:
const params = {
  Bucket: "front-end1",
  Key: selectedOption,
};
const data = await s3.getObject(params).promise();
console.log("ensemble_data----", data);
And then I would do parsing or other processing of this 'data'.
(I have fetched the raw data here, I guess; I am using "data.Body" later in data processing, like:
const workbook = XLSX.read(data.Body, { type: "array" });
)
But, to make it more secure, I moved my data fetching from S3 to backend.
Flask API endpoint:
@app.route('/api/ensemble/<selectedOption>', methods=['GET'])
@jwt_required()
def get_ensemble_data(selectedOption):
    current_user = get_jwt_identity()
    if current_user['role'] not in ['admin', 'client', 'employee']:
        return jsonify({'message': 'Unauthorized'}), 403
    try:
        response = s3.get_object(Bucket="front-end1", Key=selectedOption)
        file_content = response['Body'].read()
        return Response({"ensembleData": file_content})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
Frontend Code:
const response = await axios.get(
  `${baseUrl}/api/ensemble/${selectedOption}`,
  {
    headers: {
      Authorization: `Bearer ${localStorage.getItem("token")}`,
    },
  }
);
const data = response.data.ensembleData;
console.log("ensemble_data----", data);
And then I would do the exact same parsing or other processing of this 'data' too.
But the thing is, these "data" values are not the same. I am attaching photos of the console logs for both.
1st pic: Old Code
2nd pic: New Code
Answered by B33fb0n3
you shouldn't store data inside an Excel file and parse it and do whatever to retrieve it. S3 is a service for storing files, not data. It's also not the place to serve data efficiently. Of course you can do all this, but you will either get bugs (like you see now) or a huge bill (you might see this in the future).
Store data where it wants to be stored, and that's inside a database. There the data can be efficiently created, read, updated and deleted (CRUD).
So create a database and put your data in there. Then use an ORM like drizzle to do the CRUD operations.
AWS itself has a service for databases as well ("Amazon RDS"): https://aws.amazon.com/de/rds/postgresql/
It can also be seamlessly integrated with drizzle: https://orm.drizzle.team/docs/connect-aws-data-api-pg
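A minimal sketch of what that could look like with drizzle against Postgres (the "ensemble_rows" table and its columns are made up for illustration; the real schema depends on what's inside the Excel files):
import { pgTable, serial, text, doublePrecision } from "drizzle-orm/pg-core";
import { drizzle } from "drizzle-orm/node-postgres";
import { eq } from "drizzle-orm";
import { Pool } from "pg";

// Hypothetical table: one row per data point instead of one big file.
const ensembleRows = pgTable("ensemble_rows", {
  id: serial("id").primaryKey(),
  dataset: text("dataset").notNull(), // plays the role of the old S3 key / selectedOption
  value: doublePrecision("value").notNull(),
});

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

// Fetch only the rows of the selected dataset instead of downloading
// and parsing a whole .csv/.xlsx file.
async function getEnsembleData(selectedOption: string) {
  return db.select().from(ensembleRows).where(eq(ensembleRows.dataset, selectedOption));
}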
16 Replies
Dwarf CrocodileOP
actually, I am not storing the data; the files are already present.
And there are 50+ files in total. Each is very long.
But they all have the same format.
They are basically in .csv and .xlsx format.
So, I am fetching a specific file based on the user's selected option in the frontend.
And then I am mapping them and doing other processing.
So I need those files to be on S3, not in a DB
well... data is data. So get the data out of your Excel files and into a DB. That's the only solution I want to give you, and yes, I am talking from experience about the problems that you will face:
Of course you can do all this, but you will either get bugs (like you see now) or a huge bill (you might see this in the future)
Dwarf CrocodileOP
Ok, will try to transfer the data to a DB, but probably next month. Right now, many more files are being generated in SageMaker and transferred to S3. Once that is done, I will try to move the data over.
But, for the time being, can you help with what's causing this problem?
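(A likely culprit, for reference: Response({"ensembleData": file_content}) does not JSON-encode the dict; Flask treats the dict as the response body iterable, so the original bytes never reach the browser. And even with jsonify, raw .xlsx bytes can't be sent as JSON text without corruption. A sketch of the frontend side, assuming the Flask endpoint is changed to return the raw bytes, e.g. return Response(file_content, mimetype="application/octet-stream"):)
const response = await axios.get(
  `${baseUrl}/api/ensemble/${selectedOption}`,
  {
    headers: { Authorization: `Bearer ${localStorage.getItem("token")}` },
    // Ask axios for raw binary; the default text/JSON handling mangles .xlsx bytes.
    responseType: "arraybuffer",
  }
);
// Equivalent of the old data.Body: bytes that XLSX.read can parse directly.
const data = new Uint8Array(response.data);
const workbook = XLSX.read(data, { type: "array" });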
that sounds great. I am pretty sure that SageMaker also offers a way to export directly to your RDS database, as both are AWS services
Dwarf CrocodileOP
wow, are there any tutorials or docs that I can follow?
and I think I will be able to read and process/parse the data much faster if the data is in a DATABASE than by fetching it from S3
and maybe use CloudFront to make it even faster?
you are right: CloudFront is a CDN and will serve your data. When serving it directly from the origin (your S3), it can get very expensive
Dwarf CrocodileOP
Oh..no worries.
Dwarf CrocodileOP
My webpage takes a bit of time when it first loads a protein structure.
The thing is:
Every protein has a .csv, a .pdb, and some other .xlsx files on S3. A user selects a protein from a dropdown, and then I fetch all the files of that protein.
These are large files (for some bigger proteins, they are about 5k+ lines).
Protein structures are generated from data coming from the .pdb files. Atom-level summaries, shown when a user clicks on a specific atom in the 3D structure, come from the .csv file. And so on.
How can I improve my website's latency? And overall performance and speed?
you are experiencing one of the many problems of saving data inside a file: the whole file needs to be downloaded. And that takes time. So instead of downloading the whole file, you can fetch only the data that you really need.
And now we are on the Database side: use a database to fetch only the data that you need.
@Dwarf Crocodile solved?
Dwarf CrocodileOP
For some .csv or .pdb files, I need to fetch the whole file, however long it is. So what about those?
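(For the files that really do have to be fetched whole, one common pattern, not mentioned above, is to keep them on S3 but let the browser download them directly via a short-lived presigned URL, optionally fronted by CloudFront, instead of streaming the bytes through Flask. A sketch, assuming a hypothetical /api/ensemble-url/<key> endpoint that returns { "url": ... }, generated on the backend e.g. with boto3's generate_presigned_url:)
// Hypothetical endpoint that returns a short-lived presigned URL for the S3 object.
const { data } = await axios.get(`${baseUrl}/api/ensemble-url/${selectedOption}`, {
  headers: { Authorization: `Bearer ${localStorage.getItem("token")}` },
});
// The browser downloads straight from S3/CloudFront, so the large file never
// passes through the Flask server. No Authorization header here: the presigned
// query string already carries the credentials.
const file = await axios.get(data.url, { responseType: "arraybuffer" });
const workbook = XLSX.read(new Uint8Array(file.data), { type: "array" });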